replace xml.NewEncoder with xml.EscapeText by artur-chopikian · Pull Request #2100 · qax-os/excelize

artur-chopikian · 2025-03-06T10:29:57Z

PR Details

Memory allocations

Description

xml.NewEncoder uses bufio.NewWriter, which allocates a 4096-byte buffer on every call, i.e. for every cell with text in the xlsx file. You can imagine how much that adds up to.

const (
	defaultBufSize = 4096
)

func NewWriter(w io.Writer) *Writer {
	return NewWriterSize(w, defaultBufSize)
}

Also, xml.EscapeText shows new lines properly in the xlsx file.
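To make the allocation difference concrete, here is a rough standalone sketch (not code from excelize) that escapes the same short string both ways and uses runtime.ReadMemStats to estimate the heap bytes allocated per call. The helper names escapeWithEncoder, escapeWithEscapeText and bytesPerCall are hypothetical, and the exact numbers will vary with the Go version and platform.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"runtime"
)

// escapeWithEncoder mirrors the per-call xml.NewEncoder approach: every call
// creates a new Encoder, which wraps the writer in a bufio.Writer and so
// allocates a fresh 4096-byte buffer.
func escapeWithEncoder(value string) string {
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	_ = enc.EncodeToken(xml.CharData(value))
	_ = enc.Flush()
	return buf.String()
}

// escapeWithEscapeText writes the escaped text straight into the destination
// buffer, with no intermediate bufio.Writer.
func escapeWithEscapeText(value string) string {
	var buf bytes.Buffer
	_ = xml.EscapeText(&buf, []byte(value))
	return buf.String()
}

// bytesPerCall returns the average number of heap bytes allocated by f,
// based on the cumulative TotalAlloc counter (which GC never decreases).
func bytesPerCall(runs int, f func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < runs; i++ {
		f()
	}
	runtime.ReadMemStats(&after)
	return (after.TotalAlloc - before.TotalAlloc) / uint64(runs)
}

// sink keeps results reachable so the calls are not optimized away.
var sink string

func main() {
	const value = "text\n"
	enc := bytesPerCall(1000, func() { sink = escapeWithEncoder(value) })
	esc := bytesPerCall(1000, func() { sink = escapeWithEscapeText(value) })
	fmt.Printf("NewEncoder: ~%d B/call, EscapeText: ~%d B/call\n", enc, esc)
}
```

The encoder path should report at least 4096 bytes per call, while the EscapeText path stays in the tens of bytes for a short value.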

Types of changes

Docs change / refactoring / dependency upgrade
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.

artur-chopikian · 2025-03-06T10:36:00Z

@xuri, please take a look at this. I hope we can roll back this change.

The commit where this change was added: 9999221

codecov · 2025-03-06T10:43:14Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.20%. Comparing base (aef20e2) to head (271c282).
Report is 5 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2100   +/-   ##
=======================================
  Coverage   99.20%   99.20%           
=======================================
  Files          32       32           
  Lines       30096    30102    +6     
=======================================
+ Hits        29858    29864    +6     
  Misses        158      158           
  Partials       80       80

Flag	Coverage Δ
unittests	`99.20% <100.00%> (+<0.01%)`	⬆️


xuri

This change will cause the xml:space="preserve" attribute of the t element to go missing, so the \n new line won't work.

Before:

<c r="A1" s="1" t="inlineStr">
    <is>
        <t xml:space="preserve">text
</t>
    </is>
</c>

After this PR change:

<c r="A1" s="1" t="inlineStr">
    <is>
        <t>text&#xA;</t>
    </is>
</c>
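The difference comes from how the two standard-library paths treat \n: Encoder.EncodeToken with an xml.CharData token writes a newline through unchanged, while xml.EscapeText always escapes it as &#xA;. A minimal sketch (viaEncoder and viaEscapeText are hypothetical helper names) that compares both for the cell value "text\n":

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// viaEncoder escapes character data the way xml.Encoder does for CharData
// tokens: '\n' is written as a literal line break.
func viaEncoder(s string) string {
	var b strings.Builder
	enc := xml.NewEncoder(&b)
	_ = enc.EncodeToken(xml.CharData(s))
	_ = enc.Flush()
	return b.String()
}

// viaEscapeText uses xml.EscapeText, which also escapes '\n' as "&#xA;".
func viaEscapeText(s string) string {
	var b strings.Builder
	_ = xml.EscapeText(&b, []byte(s))
	return b.String()
}

func main() {
	fmt.Printf("%q\n", viaEncoder("text\n"))    // "text\n"
	fmt.Printf("%q\n", viaEscapeText("text\n")) // "text&#xA;"
}
```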

For example:

package main

import (
	"fmt"

	"github.com/xuri/excelize/v2"
)

func main() {
	f := excelize.NewFile()
	defer func() {
		if err := f.Close(); err != nil {
			fmt.Println(err)
		}
	}()
	sw, err := f.NewStreamWriter("Sheet1")
	if err != nil {
		fmt.Println(err)
		return
	}
	styleID, err := f.NewStyle(&excelize.Style{
		Alignment: &excelize.Alignment{WrapText: true},
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	if err := sw.SetRow("A1", []interface{}{excelize.Cell{Value: "text\n", StyleID: styleID}}); err != nil {
		fmt.Println(err)
		return
	}
	if err := sw.Flush(); err != nil {
		fmt.Println(err)
		return
	}
	if err = f.SaveAs("Book1.xlsx"); err != nil {
		fmt.Println(err)
	}
}

With this change, the new line after the A1 cell value text is lost. Before:

text

After this PR change:

text

So I don't think we need to roll back commit 9999221.

artur-chopikian · 2025-03-07T14:41:56Z

@xuri Thanks, I got it! Then I don't see any other way than copying this small method and making it work as we expect.

artur-chopikian · 2025-03-07T14:44:11Z

@xuri Or what if we check for it before escaping? Can you think of any problem this could cause?

// trimCellValue provides a function to set string type to cell.
func trimCellValue(value string, escape bool) (v string, ns xml.Attr) {
	if utf8.RuneCountInString(value) > TotalCellChars {
		value = string([]rune(value)[:TotalCellChars])
	}
	if value != "" {
		prefix, suffix := value[0], value[len(value)-1]
		for _, ascii := range []byte{9, 10, 13, 32} {
			if prefix == ascii || suffix == ascii {
				ns = xml.Attr{
					Name:  xml.Name{Space: NameSpaceXML, Local: "space"},
					Value: "preserve",
				}
				break
			}
		}

		if escape {
			var buf bytes.Buffer
			_ = xml.EscapeText(&buf, []byte(value))
			value = buf.String()
		}
	}
	v = bstrMarshal(value)
	return
}

And we get this output:

<c r="A1" s="1" t="inlineStr">
    <is>
        <t xml:space="preserve">text&#xA;</t>
    </is>
</c>
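The order of the whitespace check and the escaping matters. In the v2.9.0 listing further down, the check runs after escaping; that is harmless with xml.Encoder (the newline stays literal), but with xml.EscapeText a trailing \n has already become &#xA;, the last byte is ';', and the preserve attribute is dropped. A minimal sketch of the check itself, using a hypothetical helper name needsSpacePreserve that mirrors the byte comparison in trimCellValue:

```go
package main

import "fmt"

// needsSpacePreserve reports whether a cell string starts or ends with a
// tab, line feed, carriage return or space (bytes 9, 10, 13, 32), in which
// case the <t> element needs xml:space="preserve".
func needsSpacePreserve(value string) bool {
	if value == "" {
		return false
	}
	prefix, suffix := value[0], value[len(value)-1]
	for _, ascii := range []byte{'\t', '\n', '\r', ' '} {
		if prefix == ascii || suffix == ascii {
			return true
		}
	}
	return false
}

func main() {
	// Checking the raw value detects the trailing newline; checking the
	// already-escaped value does not.
	for _, v := range []string{"text\n", "text&#xA;", " text", "te\nxt"} {
		fmt.Printf("%q -> %v\n", v, needsSpacePreserve(v))
	}
}
```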

xuri

Yeah, your latest change will escape \n in a different way:

Before:

<c r="A1" s="1" t="inlineStr">
    <is>
        <t xml:space="preserve">text
</t>
    </is>
</c>

After this PR change:

<c r="A1" s="1" t="inlineStr">
    <is>
        <t xml:space="preserve">text&#xA;</t>
    </is>
</c>

With this change, the new line after the A1 cell value text is lost in Office 2007 on Windows, although it works in Office 2010 on Windows and in Excel for Mac.

artur-chopikian · 2025-03-07T17:53:21Z

What about the other characters? I think we also have a problem with these symbols, because they will be replaced with:

\t -> &#x9;
\r -> &#xD;
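For context, both standard-library paths already escape \t as &#x9; and \r as &#xD;; the only character they treat differently is \n. So, as far as these two characters go, switching from xml.NewEncoder to xml.EscapeText should not change the output. A small sketch to compare them (escapeBoth is a hypothetical helper):

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// escapeBoth returns the output of the xml.Encoder CharData path and of
// xml.EscapeText for the same input string.
func escapeBoth(s string) (encoder, escapeText string) {
	var a strings.Builder
	enc := xml.NewEncoder(&a)
	_ = enc.EncodeToken(xml.CharData(s))
	_ = enc.Flush()

	var b strings.Builder
	_ = xml.EscapeText(&b, []byte(s))
	return a.String(), b.String()
}

func main() {
	for _, s := range []string{"a\tb", "a\rb", "a\nb"} {
		e, t := escapeBoth(s)
		fmt.Printf("%q: Encoder=%q EscapeText=%q\n", s, e, t)
	}
}
```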

xuri · 2025-03-08T06:10:00Z

Transforming \t to &#x9; with xml.EscapeText is not a problem; it works in all versions of Excel.

The \r symbol cannot be used to add a new line in the cell, so it may not function correctly in all versions of Excel.

Therefore, I suggest maintaining the current trimCellValue code for better compatibility.

xuri

Thanks for your PR. Do you have any benchmark data on the performance impact of using xml.EscapeText instead of xml.NewEncoder? Specifically, how much memory is saved, and what percentage of speed improvement can be expected? I don't recommend copying code from the standard library; if necessary, it would be better to submit a patch to improve the Go standard library directly.

artur-chopikian · 2025-03-21T19:12:36Z

Hi! The file contains around 80k lines. Here are the memory allocation profiles of trimCellValue for both versions:

github.com/xuri/excelize/v2 v2.8.1

         .          .    502:func trimCellValue(value string, escape bool) (v string, ns xml.Attr) {
         .          .    503:   if utf8.RuneCountInString(value) > TotalCellChars {
         .          .    504:           value = string([]rune(value)[:TotalCellChars])
         .          .    505:   }
         .          .    506:   if escape {
   56.50MB    56.50MB    507:           var buf bytes.Buffer
      14MB       49MB    508:           _ = xml.EscapeText(&buf, []byte(value))
         .     7.50MB    509:           value = buf.String()
         .          .    510:   }
         .          .    511:   if len(value) > 0 {
         .          .    512:           prefix, suffix := value[0], value[len(value)-1]
         .          .    513:           for _, ascii := range []byte{9, 10, 13, 32} {
         .          .    514:                   if prefix == ascii || suffix == ascii {

github.com/xuri/excelize/v2 v2.9.0

         .          .    509:func trimCellValue(value string, escape bool) (v string, ns xml.Attr) {
         .          .    510:   if utf8.RuneCountInString(value) > TotalCellChars {
         .          .    511:           value = string([]rune(value)[:TotalCellChars])
         .          .    512:   }
         .          .    513:   if escape {
      69MB       69MB    514:           var buf bytes.Buffer
         .     6.57GB    515:           enc := xml.NewEncoder(&buf)
      11MB       11MB    516:           _ = enc.EncodeToken(xml.CharData(value))
         .       55MB    517:           enc.Flush()
         .       15MB    518:           value = buf.String()
         .          .    519:   }
         .          .    520:   if len(value) > 0 {
         .          .    521:           prefix, suffix := value[0], value[len(value)-1]
         .          .    522:           for _, ascii := range []byte{9, 10, 13, 32} {
         .          .    523:                   if prefix == ascii || suffix == ascii {
         .          .    524:                           ns = xml.Attr{
         .          .    525:                                   Name:  xml.Name{Space: NameSpaceXML, Local: "space"},
         .          .    526:                                   Value: "preserve",
         .          .    527:                           }
         .          .    528:                           break
         .          .    529:                   }
         .          .    530:           }
         .          .    531:   }
         .     9.27MB    532:   v = bstrMarshal(value)
         .          .    533:   return
         .          .    534:}
         .          .    535:
         .          .    536:// setCellValue set cell data type and value for (inline) rich string cell or
         .          .    537:// formula cell.

Speed is the same because both paths end up in the same escaping routine under the hood, but xml.NewEncoder allocates a 4096-byte buffer on every call, while xml.EscapeText writes directly into the (initially empty) buffer passed to it.

That is around 119 times more than before. I would say that's too much :)

I think we need to find the best solution here. Would you consider extending the standard library the best solution?

artur-chopikian · 2025-03-23T19:35:47Z

As an alternative, it could be something like the following. It relies on a global variable; if that is acceptable, we could go with this as well. If we need concurrency safety, we can add a mutex.

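package excelize

import (
	"bytes"
	"encoding/xml"
)

// xmlEncoder is a single package-level encoder reused for every call, so the
// 4096-byte bufio buffer inside xml.Encoder is allocated only once.
// It is not safe for concurrent use without additional locking.
var xmlEncoder = newEncoder()

// encoder pairs an xml.Encoder with the buffer it writes into.
type encoder struct {
	*xml.Encoder

	buf bytes.Buffer
}

// newEncoder creates an encoder whose output goes to its own buffer.
func newEncoder() *encoder {
	e := new(encoder)
	e.Encoder = xml.NewEncoder(&e.buf)
	return e
}

// encode returns str escaped as XML character data.
func (x *encoder) encode(str string) string {
	if str == "" {
		return ""
	}

	_ = x.EncodeToken(xml.CharData(str))
	_ = x.Flush()

	// The deferred Reset runs after buf.String() has copied the bytes
	// into the returned string, leaving the buffer empty for the next call.
	defer x.buf.Reset()

	return x.buf.String()
}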

xuri · 2025-03-24T05:00:48Z

Hi @artur-chopikian, I think the following change would be a better approach:

-var buf bytes.Buffer
-enc := xml.NewEncoder(&buf)
-_ = enc.EncodeToken(xml.CharData(value))
-enc.Flush()
-value = buf.String()
+var buf strings.Builder
+_ = xml.EscapeText(&buf, []byte(value))
+value = strings.ReplaceAll(buf.String(), "&#xA;", "\n")
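The suggested change can be wrapped in a small function. This sketch (escapeCellText and escapeWithEncoder are hypothetical names) checks that, for a few sample inputs, it produces the same output as the previous xml.NewEncoder code. The ReplaceAll is safe even for user text that literally contains "&#xA;": EscapeText first turns that '&' into "&amp;", so only escaped newlines match.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"strings"
)

// escapeCellText follows the suggested diff: escape with xml.EscapeText,
// then turn "&#xA;" back into a literal newline, matching the output that
// the previous xml.Encoder-based code produced.
func escapeCellText(value string) string {
	var buf strings.Builder
	_ = xml.EscapeText(&buf, []byte(value))
	return strings.ReplaceAll(buf.String(), "&#xA;", "\n")
}

// escapeWithEncoder is the previous per-call xml.NewEncoder approach,
// kept here only for comparison.
func escapeWithEncoder(value string) string {
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	_ = enc.EncodeToken(xml.CharData(value))
	_ = enc.Flush()
	return buf.String()
}

func main() {
	for _, v := range []string{"text\n", "a\tb", "x < y & z", "literal &#xA;"} {
		fmt.Println(escapeCellText(v) == escapeWithEncoder(v))
	}
}
```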

xuri

Hi @artur-chopikian, please use strings.Builder instead of bytes.Buffer for better performance.
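The main reason strings.Builder helps here is that bytes.Buffer.String copies the accumulated bytes into a new string, while strings.Builder.String returns them without copying. A rough sketch that counts allocations per call with testing.AllocsPerRun (escapeBuffer and escapeBuilder are hypothetical names; exact counts depend on the Go version, but the Builder variant should need one allocation fewer):

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"strings"
	"testing"
)

// sink keeps results reachable so the calls are not optimized away.
var sink string

// escapeBuffer collects the output in a bytes.Buffer; Buffer.String
// allocates a copy of the bytes.
func escapeBuffer(value string) string {
	var buf bytes.Buffer
	_ = xml.EscapeText(&buf, []byte(value))
	return buf.String()
}

// escapeBuilder collects the output in a strings.Builder; Builder.String
// returns the accumulated bytes as a string without copying.
func escapeBuilder(value string) string {
	var buf strings.Builder
	_ = xml.EscapeText(&buf, []byte(value))
	return buf.String()
}

func main() {
	const value = "a < b"
	buffer := testing.AllocsPerRun(1000, func() { sink = escapeBuffer(value) })
	builder := testing.AllocsPerRun(1000, func() { sink = escapeBuilder(value) })
	fmt.Printf("bytes.Buffer: %.0f allocs/op, strings.Builder: %.0f allocs/op\n", buffer, builder)
}
```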

xuri

LGTM, thanks for your contribution.

replace xml.NewEncoder with xml.EscapeText

547fe7a

artur-chopikian mentioned this pull request Mar 6, 2025

Tab (\t) character cannot be displayed in the generated workbook's cell #1865

Closed

xuri added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 7, 2025

xuri requested changes Mar 7, 2025


copy a modified method from a standard library

34c4f96

change if we have preserved space, then do escaping

aed33b4

artur-chopikian requested a review from xuri March 7, 2025 14:47

xuri requested changes Mar 7, 2025


artur-chopikian added 2 commits March 8, 2025 15:49

copy modified xml.EscapeText

ab3f3a9

add tests

1442644

artur-chopikian requested a review from xuri March 10, 2025 21:10

xuri requested changes Mar 13, 2025


xuri added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 14, 2025

artur-chopikian added 2 commits March 24, 2025 10:02

replace new line characters

1cdde20

remove comment

2f38c7a

artur-chopikian requested a review from xuri March 24, 2025 08:40

xuri approved these changes Mar 24, 2025


xuri added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 24, 2025

use strings builder

271c282

artur-chopikian requested a review from xuri March 24, 2025 17:41

xuri approved these changes Mar 25, 2025


xuri merged commit 91d36cc into qax-os:master Mar 25, 2025
17 checks passed
